{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "invalid syntax (<ipython-input-1-07de69cd5288>, line 3)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  File \u001b[0;32m\"<ipython-input-1-07de69cd5288>\"\u001b[0;36m, line \u001b[0;32m3\u001b[0m\n\u001b[0;31m    Atari 2600 is a popular video game console from a game company called Atari. The Atari game console provides several popular games such as pong, space invaders, Ms Pacman, break out, centipede and many more. In this section, we will learn how to build the deep Q network for playing the Atari games. First, let's understand the architecture of DQN for playing the Atari games.\u001b[0m\n\u001b[0m             ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
     ]
    }
   ],
   "source": [
    "# Playing Atari games using DQN \n",
    "\n",
    "Atari 2600 is a popular video game console from a game company called Atari. The Atari game console provides several popular games such as pong, space invaders, Ms Pacman, break out, centipede and many more. In this section, we will learn how to build the deep Q network for playing the Atari games. First, let's understand the architecture of DQN for playing the Atari games. \n",
    "\n",
    "## Architecture of DQN\n",
    "In the Atari environment, the image of the game screen is the state of the environment. So, we just feed the image of the game screen as an input to the DQN and it returns the Q value of all the actions in the state. Since we are dealing with the images, instead of using the vanilla deep neural network for approximating the Q value, we can use the convolutional neural network (CNN) since the convolutional neural network is very effective for handling images.\n",
    "\n",
    "Thus, now our DQN is the convolutional neural network. We feed the image of the game screen (game state) as an input to the convolutional neural network and it outputs the Q value of all the actions in the state.\n",
    "\n",
    "As shown in the below figure, given the image of the game screen (game state), the convolutional layers extract features from the image and produce a feature map. Next, we flatten the feature map and feed the flattened feature map as an input to the feedforward network. The feedforward network takes this flattened feature map as an input and returns the Q value of all the actions in the state: \n",
    "\n",
    "\n",
    "![title](Images/4.png)\n",
    "\n",
    "Note that here we don't perform pooling operation. A pooling operation is useful when we perform tasks such as object detection, image classification and so on where we don't consider the position of the object in the image and we just want to know whether the desired object is present in the image. For example, if we want to identify whether there is a dog in an image, we only look for whether a dog is present in the image and we don't check the position of the dog in the image. Thus, in this case, pooling operation is used to identify whether there is a dog in the image irrespective of the position of the dog.\n",
    "\n",
    "But in our setting, pooling operation should not be performed. Because to understand the current game screen (state), the position is very important. For example, in a Pong game, we just don't want to classify if there is a ball in the game screen. We want to know the position of the ball so that we can take better action. Thus, we don't include the pooling operation in our DQN architecture. \n",
    "\n",
    "Now that we have understood the architecture of DQN to play the Atari games, in the next section, we will start implementing them. \n",
    "\n",
    "\n",
    "\n",
    "## Getting hands-on with DQN\n",
    "\n",
    "Let's implement DQN to play the Ms Pacman game. \n",
    "\n",
    "First, let's import the necessary\n",
    "libraries:\n",
    "\n",
    "import random\n",
    "import gym\n",
    "import numpy as np\n",
    "from collections import deque\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D\n",
    "from tensorflow.keras.optimizers import Adam\n",
    "\n",
    "Now, let's create the Ms Pacman game environment using gym:\n",
    "\n",
    "env = gym.make(\"MsPacman-v0\")\n",
    "\n",
    "Set the state size:\n",
    "\n",
    "state_size = (88, 80, 1)\n",
    "\n",
    "Get the number of actions:\n",
    "\n",
    "action_size = env.action_space.n\n",
    "\n",
    "## Preprocess the game screen\n",
    "\n",
    "We learned that we feed the game state (image of the game screen) as an input to the DQN which is the convolutional neural network and it outputs the Q value of all the actions in the state. However, directly feeding the raw game screen image is not efficient, since the raw game screen size will be 210 x 160 x 3. This will be computationally expensive.\n",
    "\n",
    "To avoid this, we preprocess the game screen and then feed the preprocessed game screen to the DQN. First, we crop and resize the game screen image, convert the image to grayscale, normalize and then reshape the image to 88 x 80 x 1. Next, we feed this pre-processed game screen image as an input to the convolutional network (DQN) which returns the Q value.\n",
    "\n",
    "Now, let's define a function called preprocess_state which takes the game state (image of the game screen) as an input and returns the preprocessed game state (image of the game screen):\n",
    "\n",
    "color = np.array([210, 164, 74]).mean()\n",
    "\n",
    "def preprocess_state(state):\n",
    "\n",
    "    #crop and resize the image\n",
    "    image = state[1:176:2, ::2]\n",
    "\n",
    "    #convert the image to greyscale\n",
    "    image = image.mean(axis=2)\n",
    "\n",
    "    #improve image contrast\n",
    "    image[image==color] = 0\n",
    "\n",
    "    #normalize the image\n",
    "    image = (image - 128) / 128 - 1\n",
    "    \n",
    "    #reshape the image\n",
    "    image = np.expand_dims(image.reshape(88, 80, 1), axis=0)\n",
    "\n",
    "    return image\n",
    "\n",
    "## Building the DQN \n",
    "\n",
    "Now, let's build the deep Q network. We learned that for playing atari games we use the\n",
    "convolutional neural network as the DQN which takes the image of the game screen as an\n",
    "input and returns the Q values.\n",
    "\n",
    "We define the DQN with three convolutional layers. The convolutional layers extract the\n",
    "features from the image and output the feature maps and then we flattened the feature map\n",
    "obtained by the convolutional layers and feed the flattened feature maps to the feedforward\n",
    "network (fully connected layer) which returns the Q value:\n",
    "\n",
    "class DQN:\n",
    "    def __init__(self, state_size, action_size):\n",
    "        \n",
    "        #define the state size\n",
    "        self.state_size = state_size\n",
    "        \n",
    "        #define the action size\n",
    "        self.action_size = action_size\n",
    "        \n",
    "        #define the replay buffer\n",
    "        self.replay_buffer = deque(maxlen=5000)\n",
    "        \n",
    "        #define the discount factor\n",
    "        self.gamma = 0.9  \n",
    "        \n",
    "        #define the epsilon value\n",
    "        self.epsilon = 0.8   \n",
    "        \n",
    "        #define the update rate at which we want to update the target network\n",
    "        self.update_rate = 1000    \n",
    "        \n",
    "        #define the main network\n",
    "        self.main_network = self.build_network()\n",
    "        \n",
    "        #define the target network\n",
    "        self.target_network = self.build_network()\n",
    "        \n",
    "        #copy the weights of the main network to the target network\n",
    "        self.target_network.set_weights(self.main_network.get_weights())\n",
    "        \n",
    "\n",
    "    #Let's define a function called build_network which is essentially our DQN. \n",
    "\n",
    "    def build_network(self):\n",
    "        model = Sequential()\n",
    "        model.add(Conv2D(32, (8, 8), strides=4, padding='same', input_shape=self.state_size))\n",
    "        model.add(Activation('relu'))\n",
    "        \n",
    "        model.add(Conv2D(64, (4, 4), strides=2, padding='same'))\n",
    "        model.add(Activation('relu'))\n",
    "        \n",
    "        model.add(Conv2D(64, (3, 3), strides=1, padding='same'))\n",
    "        model.add(Activation('relu'))\n",
    "        model.add(Flatten())\n",
    "\n",
    "\n",
    "        model.add(Dense(512, activation='relu'))\n",
    "        model.add(Dense(self.action_size, activation='linear'))\n",
    "        \n",
    "        model.compile(loss='mse', optimizer=Adam())\n",
    "\n",
    "        return model\n",
    "\n",
    "    #We learned that we train DQN by randomly sampling a minibatch of transitions from the\n",
    "    #replay buffer. So, we define a function called store_transition which stores the transition information\n",
    "    #into the replay buffer\n",
    "\n",
    "    def store_transistion(self, state, action, reward, next_state, done):\n",
    "        self.replay_buffer.append((state, action, reward, next_state, done))\n",
    "        \n",
    "\n",
    "    #We learned that in DQN, to take care of exploration-exploitation trade off, we select action\n",
    "    #using the epsilon-greedy policy. So, now we define the function called epsilon_greedy\n",
    "    #for selecting action using the epsilon-greedy policy.\n",
    "    \n",
    "    def epsilon_greedy(self, state):\n",
    "        if random.uniform(0,1) < self.epsilon:\n",
    "            return np.random.randint(self.action_size)\n",
    "        \n",
    "        Q_values = self.main_network.predict(state)\n",
    "        \n",
    "        return np.argmax(Q_values[0])\n",
    "\n",
    "    \n",
    "    #train the network\n",
    "    def train(self, batch_size):\n",
    "        \n",
    "        #sample a mini batch of transition from the replay buffer\n",
    "        minibatch = random.sample(self.replay_buffer, batch_size)\n",
    "        \n",
    "        #compute the Q value using the target network\n",
    "        for state, action, reward, next_state, done in minibatch:\n",
    "            if not done:\n",
    "                target_Q = (reward + self.gamma * np.amax(self.target_network.predict(next_state)))\n",
    "            else:\n",
    "                target_Q = reward\n",
    "                \n",
    "            #compute the Q value using the main network \n",
    "            Q_values = self.main_network.predict(state)\n",
    "            \n",
    "            Q_values[0][action] = target_Q\n",
    "            \n",
    "            #train the main network\n",
    "            self.main_network.fit(state, Q_values, epochs=1, verbose=0)\n",
    "            \n",
    "    #update the target network weights by copying from the main network\n",
    "    def update_target_network(self):\n",
    "        self.target_network.set_weights(self.main_network.get_weights())\n",
    "\n",
    "## Training the network\n",
    "\n",
    "Now, let's train the network. First, let's set the number of episodes we want to train the\n",
    "network:\n",
    "\n",
    "num_episodes = 500\n",
    "\n",
    "Define the number of time steps\n",
    "\n",
    "num_timesteps = 20000\n",
    "\n",
    "Define the batch size:\n",
    "\n",
    "batch_size = 8\n",
    "\n",
    "Set the number of past game screens we want to consider:\n",
    "\n",
    "num_screens = 4     \n",
    "\n",
    "Instantiate the DQN class\n",
    "\n",
    "dqn = DQN(state_size, action_size)\n",
    "\n",
    "done = False\n",
    "time_step = 0\n",
    "\n",
    "#for each episode\n",
    "for i in range(num_episodes):\n",
    "    \n",
    "    #set return to 0\n",
    "    Return = 0\n",
    "    \n",
    "    #preprocess the game screen\n",
    "    state = preprocess_state(env.reset())\n",
    "\n",
    "    #for each step in the episode\n",
    "    for t in range(num_timesteps):\n",
    "        \n",
    "        #render the environment\n",
    "        env.render()\n",
    "        \n",
    "        #update the time step\n",
    "        time_step += 1\n",
    "        \n",
    "        #update the target network\n",
    "        if time_step % dqn.update_rate == 0:\n",
    "            dqn.update_target_network()\n",
    "        \n",
    "        #select the action\n",
    "        action = dqn.epsilon_greedy(state)\n",
    "        \n",
    "        #perform the selected action\n",
    "        next_state, reward, done, _ = env.step(action)\n",
    "        \n",
    "        #preprocess the next state\n",
    "        next_state = preprocess_state(next_state)\n",
    "        \n",
    "        #store the transition information\n",
    "        dqn.store_transistion(state, action, reward, next_state, done)\n",
    "        \n",
    "        #update current state to next state\n",
    "        state = next_state\n",
    "        \n",
    "        #update the return\n",
    "        Return += reward\n",
    "        \n",
    "        #if the episode is done then print the return\n",
    "        if done:\n",
    "            print('Episode: ',i, ',' 'Return', Return)\n",
    "            break\n",
    "            \n",
    "        #if the number of transistions in the replay buffer is greater than batch size\n",
    "        #then train the network\n",
    "        if len(dqn.replay_buffer) > batch_size:\n",
    "            dqn.train(batch_size)\n",
    "\n",
    "\n",
    "Now that we have learned how deep Q network works and how to build the DQN to play the Atari games, in the next section, we will learn an interesting variant of DQN called\n",
    "double DQN. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
